Semantic Atomicity and Multilinguality in the Medical Domain: Design Considerations for the MorphoSaurus Subword Lexicon

Authors

  • Stefan Schulz
  • Kornél G. Markó
  • Philipp Daumke
  • Udo Hahn
  • Susanne Hanser
  • Percy Nohama
  • Roosewelt L. Andrade
  • Edson José Pacheco
  • Martin Romacker
Abstract

We present the lexico-semantic foundations underlying a multilingual lexicon whose entries are so-called subwords. These subwords reflect semantic atomicity constraints in the medical domain that diverge from the canonical lexicological understanding in NLP. We focus on criteria for identifying and delimiting reasonable subword units, for grouping them into functionally adequate synonymy classes, and for relating them by two types of lexical relations. The lexicon we implemented on the basis of these considerations forms the lexical backbone of MORPHOSAURUS, a cross-language document retrieval engine for the medical domain.
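
The abstract does not spell out how words are split into subwords or mapped to synonymy classes. As a rough illustration only, the Python sketch below assumes a tiny hypothetical lexicon (every entry and the `#liver` / `#inflamm` class identifiers are invented) and uses a fewest-segments dynamic-programming segmenter, a swapped-in technique that is not necessarily the one MorphoSaurus uses. It shows how words from different languages can reduce to the same sequence of synonymy classes; the two types of lexical relations are not modeled.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon mapping subwords to synonymy-class identifiers.
# Entries and identifiers are invented for illustration; None marks a
# segment (here a German derivational suffix) assumed to carry no content.
LEXICON: Dict[str, Optional[str]] = {
    "hepat": "#liver",      # Greek-derived stem (English, Portuguese, ...)
    "leber": "#liver",      # German stem
    "itis": "#inflamm",     # English suffix
    "ite": "#inflamm",      # Portuguese suffix
    "entzünd": "#inflamm",  # German stem
    "ung": None,            # German nominal suffix, ignored
}


def segment(word: str, lexicon: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Split `word` into lexicon subwords, preferring the fewest segments.

    Returns None if the word cannot be fully covered by lexicon entries.
    """
    word = word.lower()
    n = len(word)
    # best[i] is a shortest segmentation of word[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in lexicon:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


def to_classes(word: str, lexicon: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Map a word to the synonymy classes of its content-bearing subwords."""
    segments = segment(word, lexicon)
    if segments is None:
        return None
    return [c for c in (lexicon[s] for s in segments) if c is not None]


if __name__ == "__main__":
    for w in ["Hepatitis", "hepatite", "Leberentzündung"]:
        print(w, segment(w, LEXICON), to_classes(w, LEXICON))
    # Hepatitis ['hepat', 'itis'] ['#liver', '#inflamm']
    # hepatite ['hepat', 'ite'] ['#liver', '#inflamm']
    # Leberentzündung ['leber', 'entzünd', 'ung'] ['#liver', '#inflamm']
```

Under these assumptions, the English, Portuguese and German words all map to the same class sequence, which is what would let a query in one language match documents in another. A word that the lexicon cannot fully cover (e.g. "nephritis" here) yields None rather than a partial segmentation.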

Publication date: 2006